Snowpack variability in western North America over the common era

Models and Observations

Nick Gauthier

Last knit on: 2019-11-20

Analysis of simulated and observed snow water equivalent (SWE) dynamics in western North America over the Common Era.

Setup

Import the packages required for this analysis.

library(raster) # processing raster data
library(tidyverse) # data manipulation and visualization
library(mgcv) # flexible nonlinear regression models
library(furrr) # parallel processing
library(sf) # shapefiles and plotting
library(gganimate) # animated gifs
library(ggridges) # ridge plots
library(broom) # tidying model objects

Define a study area to constrain all computaitons.

bbox <- extent(c(-125, -104, 33, 50))

Observations

Import the snow observation data from https://nsidc.org/data/nsidc-0719.1 What’s up with this warning message? long_name=CRS definition spatial_ref=GEOGCS[“NAD83”,DATUM[“North_American_Datum_1983”,SPHEROID[“GRS 1980”,6378137,298.257222101,AUTHORITY[“EPSG”,“7019”]],AUTHORITY[“EPSG”,“6269”]],PRIMEM[“Greenwich”,0,AUTHORITY[“EPSG”,“8901”]],UNIT[“degree”,0.01745329251994328,AUTHORITY[“EPSG”,“9122”]],AUTHORITY[“EPSG”,“4269”]] GeoTransform=-125.0208 0.04166662697178698 0 49.9375 0 -0.04166662697178698. First define a function to extract the map for April 1st from each annual file, then run it on the SWE and snow depth data.

preprocess <- function(x, var, flip = FALSE, regrid = FALSE) {
  maps <- brick(x, varname = var)

  indices <- getZ(maps) %>% # find the index for April 1st
    str_detect('-04-01') %>% 
    which()
  
  subset(maps, indices) %>%
    {if(flip) rotate(.) else .} %>%
    crop(bbox) %>%
    {if(regrid) aggregate(., fact = 2, na.rm = TRUE) else .} # resample to lower res to speed up analysis
}
plan(multisession) # setup parallel processing

# import SWE data
swe_obs <- list.files('data', pattern = 'SWE_Depth', full.names = TRUE) %>%
  future_map(preprocess, var = 'SWE', regrid = TRUE) %>%
  brick()

# import depth data
depth_obs <- list.files('data', pattern = 'SWE_Depth', full.names = TRUE) %>%
  future_map(preprocess, var = 'DEPTH', regrid = TRUE) %>%
  brick()

Import SRTM elevation and resample to the resolution of the observations. Also calculate the area of each grid cell.

elev <- raster('~/gdrive/Data/SRTM_1km.tif') %>%
  crop(bbox) %>%
  resample(swe_obs) %>%
  brick(., area(.))

Turn all the rasters into data frames and join.

swe_dat <- swe_obs %>%
  as.data.frame(xy = TRUE, na.rm = TRUE, long = TRUE) %>%
  mutate(year = round(parse_number(layer))) %>%
  rename(SWE = value) %>%
  select(-layer)

depth_dat <- depth_obs %>%
  as.data.frame(xy = TRUE, na.rm = TRUE, long = TRUE) %>%
  mutate(year = round(parse_number(layer))) %>%
  rename(DEPTH = value) %>%
  select(-layer)

elev_dat <- elev %>%
  as.data.frame(xy = TRUE, na.rm = TRUE) %>%
  rename(elev = SRTM_1km, area = layer)

snow_dat <- swe_dat %>%
  left_join(depth_dat, by = c('x', 'y', 'year')) %>%
  left_join(elev_dat, by = c('x', 'y'))

Look at how SWE has varied over time and space.

## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

Some aggregate statistics. Is this the right way to aggregate?

snow_agg <- snow_dat %>%
  filter(SWE > 0) %>%
  group_by(year) %>%
  summarise(SWE = sum(SWE * area), # units in megaliters (?)
            area = sum(area))
How has total SWE varied over the observational period?

What about total area covered?

CESM-LME Comparison

cesm_h2osno <- preprocess('~/Downloads/b.e11.BLMTRC5CN.f19_g16.001.clm2.h1.H2OSNO.08500101-18491231.nc',
                          var = 'H2OSNO',
                          flip = TRUE)
## Loading required namespace: ncdf4
cesm_h2osno_ext <- preprocess('~/Downloads/b.e11.BLMTRC5CN.f19_g16.001.clm2.h1.H2OSNO.18500101-20051231.nc',
                          var = 'H2OSNO',
                          flip = TRUE)

cesm_areas <- area(cesm_h2osno) %>%
  as.data.frame(xy = TRUE, na.rm = TRUE) %>%
  rename(area = layer)

still need to account for snow fraction of grid cell? and land fraction too?

cesm_dat <- c(cesm_h2osno, cesm_h2osno_ext) %>%
  brick() %>%
  as.data.frame(xy = TRUE, na.rm = TRUE, long = TRUE) %>%
  mutate(year = str_sub(layer, 2) %>% parse_number() %>% round) %>%
  rename(SWE = value) %>%
  select(-layer) %>%
  left_join(cesm_areas, by = c('x', 'y'))

cesm_agg <- cesm_dat %>%
  group_by(year) %>%
  summarise(SWE = sum(SWE * area, na.rm = TRUE)) %>%
  mutate(avg = zoo::rollmean(SWE, k = 100, na.pad = TRUE))

So there’s a clear bias here. As we see below, the mean of the observations is well higher than it should be. Looks like the simulation underpredicts snow – likely because of all the small-scale topographic influences that CESM can’t resolve.

## Picking joint bandwidth of 2.1e+07

Let’s compare the observations and simulation directly.

What’s the climatic mean April 1st SWE for the observations and simulations?

To what extent is this mistmatch due to bias or scale effects? Test by resampling the observations to the CESM LME grid.

This comparison suggests that CESM is underpredicting SWE because of the simplified topography. Check this by looking at the underlying topography boundary files for that simulation.

surface_geop <- raster('data/consistent-topo-fv1.9x2.5_c130424.nc', 
                       var = 'PHIS') %>%
  rotate() %>%
  crop(bbox, snap = 'out') %>%
  `/`(9.80665) %>%
  as.data.frame(na.rm = TRUE, xy = TRUE) 

landfrac <- raster('data/consistent-topo-fv1.9x2.5_c130424.nc', 
                   var = 'LANDFRAC') %>%
  rotate() %>%
  crop(bbox, snap = 'out')

Might be interesting to look at snow centroid date too down the road. Spatial patterns notwithstanding, maybe the aggregate temporal patterns are still recoverable?

Empirical Orthogonal Functions

Now run a principal components analysis on the observed SPEI data. Much of the code that follows is adapted from the wql and sinkr packages.

obs_pca <- swe_dat %>%
  group_by(x, y) %>%
  mutate(test = all(SWE == 0)) %>%
  filter(test == FALSE) %>%
  ungroup() %>%
  select(-test)%>%
  spread(year, SWE) %>%
  select(-x, -y) %>%
  t() %>%
  prcomp(scale. = TRUE)

sim_pca <- cesm_dat %>%
  filter(year > 1900) %>%
  group_by(x, y) %>%
  mutate(test = all(SWE == 0)) %>%
  filter(test == FALSE) %>%
  ungroup() %>%
  select(-test) %>%
  spread(year, SWE) %>%
  select(-x, -y, -area) %>%
  t() %>%
  prcomp(scale. = TRUE)

eofs_obs <- obs_pca %>% # calculate unrotated EOFs
  tidy(matrix = 'variables') %>%
  filter(PC <= 2) %>%
  left_join(obs_eigs[1:2], by = 'PC') %>%
  mutate(eof = value * std.dev,
         PC = as.character(PC)) %>%
  select(-std.dev,-value) 

eofs_sim <- sim_pca %>% # calculate unrotated EOFs
  tidy(matrix = 'variables') %>%
  filter(PC <= 4) %>%
  left_join(sim_eigs[1:2], by = 'PC') %>%
  mutate(eof = value * std.dev,
         PC = as.character(PC)) %>%
  select(-std.dev,-value) 
## Joining, by = "column"

## Joining, by = "column"

Reanalysis

So the question now is whether the mismatch between observed and simulated SWE is due to the meteorolgical forcing. That is, if only we had better temperature and precipitation simulations then we could model SWE as a simple function of those. To examine this, use reanalysis data.

Start by importing ERA5 data.

ncdf_file <- 'data/adaptor.mars.internal-1566933160.7965047-18657-23-b3ae7b43-15db-4890-bc5b-4d3413fe4ddd.nc'
ref <- brick(ncdf_file, var = 'sd') %>% 
  .[[1]]%>%
  rotate %>%
  crop(bbox)

elev2 <- resample(elev[[1]], ref)

We need to use snowdepth rather than SWE, but as shown above this isn’t a huge deal in the long run. Here just import April snowdepths, which we’ll use as a rough equivalent for April 1st SWE.

snowdepth <- brick(ncdf_file, var = 'sd') %>% 
  .[[seq(4,484, 12)]] %>% # get april
  rotate %>%
  crop(bbox) %>%
  mask(elev2) %>%
  as.data.frame(xy = TRUE, long = TRUE, na.rm = TRUE) %>%
  rename(time = layer, sd = value)

Also import temperature and precipiation for all months.

temp <- brick(ncdf_file, var = 't2m') %>% 
  rotate %>%
  crop(bbox) %>%
  mask(elev2) %>%
  as.data.frame(xy = TRUE, long = TRUE, na.rm = TRUE) %>%
  rename(time = Z, t2m = value)

precip <- brick(ncdf_file, var = 'tp') %>% 
  rotate %>%
  crop(bbox) %>%
  mask(elev2) %>%
  as.data.frame(xy = TRUE, long = TRUE, na.rm = TRUE) %>%
  rename(time = Z, tp = value)

Calculate the mean temperature and total precipitation for Oct-Mar to see if there’s any relationship to snowdepth.

dat <- precip %>%
  left_join(temp, by = c('x', 'y', 'time')) %>%
  mutate(t2m = t2m - 273.15) %>%
  separate(time, into = c('year', 'month', 'day'), convert = TRUE) %>%
  mutate(water_year = if_else(month < 10, year, year + 1L)) %>%
  filter(between(water_year, 1980, 2018)) %>%
  filter(month %in% c(1:3, 9:12)) %>%
  group_by(x, y, water_year) %>%
  summarise(t2m = mean(t2m), tp = sum(tp))

Combine all the data.

dat2 <- snowdepth %>%
  mutate(time = str_sub(time, start = 2)) %>%
  separate(time, into = c('year', 'month', 'day'), convert = TRUE) %>%
  select(-day, -month) %>%
  left_join(dat, by = c('x', 'y', 'year' = 'water_year')) %>%
  remove_missing()
## Warning: Removed 10458 rows containing missing values.

The spatial and temporal variability of the reanalysis snow depth looks close to observations, so that’s a good sign.

Fit a GAM predicting snow depth as a nonlinear interaction between seasonal temperature and precipitation. Include terms to account for spatial and temporal autocorrelation

mod <- dat2 %>%
  mutate(start = if_else(year == min(year), TRUE, FALSE)) %>%
  arrange(x, y) %>%
  bam(sd ~ te(t2m, tp) + te(x,y), data = ., AR.start = start)

The model performs suprisingly well.

summary(mod)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## sd ~ te(t2m, tp) + te(x, y)
## 
## Parametric coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3.445e-02  9.412e-05     366   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##              edf Ref.df       F p-value    
## te(t2m,tp) 23.82  23.98 18310.5  <2e-16 ***
## te(x,y)    23.81  23.99   462.9  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.758   Deviance explained = 75.8%
## fREML = -3.5493e+05  Scale est. = 0.0017987  n = 203931
plot(mod, scheme = 1)

Plotting the results, it does indeed look like we can predict SWE/snow depth, or at least climatological SWE, from temperature and precipitation data.

Other Analyses

Trend Analysis

Fit a geographically weighted regression to look at spatially varying SWE trends in the observations.

m1 <- bam(SWE ~ te(x, y, by = year, k = 20), data = snow_dat)

The model’s fine … not good, not great.

summary(m1)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## SWE ~ te(x, y, by = year, k = 20)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1056.66      19.45   54.34   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##              edf Ref.df    F p-value    
## te(x,y):year 382  383.8 3717  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.475   Deviance explained = 47.5%
## fREML = 9.8714e+06  Scale est. = 16088     n = 1576188

Looks good down here, but these are in absolute numbers. Maybe rescale to look at relative change?

Snow Density

Let’s model the relationship between snow depth and snow water equivalent (i.e. snow density) in the observations. Looks like its roughly linear.

snow_dat %>%
  ggplot(aes(DEPTH, SWE)) +
  geom_point() +
  theme_minimal()

m2 <- bam(SWE ~ s(DEPTH), data = snow_dat, discrete = TRUE)

The GAM explains 98% of the deviance, so not bad at all.

plot(m2)

summary(m2)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## SWE ~ s(DEPTH)
## 
## Parametric coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1165.0027     0.3417    3409   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##            edf Ref.df       F p-value    
## s(DEPTH) 8.824  8.981 8550156  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =   0.98   Deviance explained =   98%
## fREML = 7.2993e+06  Scale est. = 616.49    n = 1576188

The residuals from the (roughly) linaer fit between SWE and depth show some interesting structure. Might be worthwhile to explore later.

Temporal Autocorrelation

What about the temporal autocorrelation in the observations and simulations?